text style transfer
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada (0.04)
- (2 more...)
Controllable Stylistic Text Generation with Train-Time Attribute-Regularized Diffusion
Zhou, Fan, Tian, Chang, Van de Cruys, Tim
Generating stylistic text with specific attributes is a key problem in controllable text generation. Recently, diffusion models have emerged as a powerful paradigm for both visual and textual generation. Existing approaches can be broadly categorized into classifier-free guidance (CFG) and classifier guidance (CG) methods. While CFG effectively preserves semantic content, it often fails to provide effective attribute control. In contrast, CG modifies the denoising trajectory using classifier gradients, enabling better attribute alignment but incurring high computational costs during sampling and suffering from classifier generalization issues. In this work, we propose RegDiff, a regularized diffusion framework that leverages attribute features without requiring a pretrained classifier during sampling, thereby achieving controllable generation with reduced computational costs. Specifically, RegDiff employs a VAE-based encoder--decoder architecture to ensure reconstruction fidelity and a latent diffusion model trained with attribute supervision to enable controllable text generation. Attribute information is injected only during training. Experiments on five datasets spanning multiple stylistic attributes demonstrate that RegDiff outperforms strong baselines in generating stylistic texts. These results validate the effectiveness of RegDiff as an efficient solution for attribute-controllable text diffusion. Our code, datasets, and resources will be released upon publication at https://github.com/xxxx.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada (0.04)
- (2 more...)
Evaluating Text Style Transfer: A Nine-Language Benchmark for Text Detoxification
Protasov, Vitaly, Babakov, Nikolay, Dementieva, Daryna, Panchenko, Alexander
Despite recent progress in large language models (LLMs), evaluation of text generation tasks such as text style transfer (TST) remains a significant challenge. Recent studies (Dementieva et al., 2024; Pauli et al., 2025) revealed a substantial gap between automatic metrics and human judgments. Moreover, most prior work focuses exclusively on English, leaving multilingual TST evaluation largely unexplored. In this paper, we perform the first comprehensive multilingual study on evaluation of text detoxification system across nine languages: English, Spanish, German, Chinese, Arabic, Hindi, Ukrainian, Russian, Amharic. Drawing inspiration from the machine translation, we assess the effectiveness of modern neural-based evaluation models alongside prompting-based LLM-as-a-judge approaches. Our findings provide a practical recipe for designing more reliable multilingual TST evaluation pipeline in the text detoxification case.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (11 more...)
Implementing Long Text Style Transfer with LLMs through Dual-Layered Sentence and Paragraph Structure Extraction and Mapping
This paper addresses the challenge in long-text style transfer using zero-shot learning of large language models (LLMs), proposing a hierarchical framework that combines sentence-level stylistic adaptation with paragraph-level structural coherence. We argue that in the process of effective paragraph-style transfer, to preserve the consistency of original syntactic and semantic information, it is essential to perform style transfer not only at the sentence level but also to incorporate paragraph-level semantic considerations, while ensuring structural coherence across inter-sentential relationships. Our proposed framework, ZeroStylus, operates through two systematic phases: hierarchical template acquisition from reference texts and template-guided generation with multi-granular matching. The framework dynamically constructs sentence and paragraph template repositories, enabling context-aware transformations while preserving inter-sentence logical relationships. Experimental evaluations demonstrate significant improvements over baseline methods, with structured rewriting achieving 6.90 average score compared to 6.70 for direct prompting approaches in tri-axial metrics assessing style consistency, content preservation, and expression quality. Ablation studies validate the necessity of both template hierarchies during style transfer, showing higher content preservation win rate against sentence-only approaches through paragraph-level structural encoding, as well as direct prompting method through sentence-level pattern extraction and matching. The results establish new capabilities for coherent long-text style transfer without requiring parallel corpora or LLM fine-tuning.
Towards Interpretable Soft Prompts
Patel, Oam, Wang, Jason, Nayak, Nikhil Shivakumar, Srinivas, Suraj, Lakkaraju, Himabindu
Soft prompts have been popularized as a cheap and easy way to improve task-specific LLM performance beyond few-shot prompts. Despite their origin as an automated prompting method, however, soft prompts and other trainable prompts remain a black-box method with no immediately interpretable connections to prompting. We create a novel theoretical framework for evaluating the interpretability of trainable prompts based on two desiderata: faithfulness and scrutability. We find that existing methods do not naturally satisfy our proposed interpretability criterion. Instead, our framework inspires a new direction of trainable prompting methods that explicitly optimizes for interpretability. To this end, we formulate and test new interpretability-oriented objective functions for two state-of-the-art prompt tuners: Hard Prompts Made Easy (PEZ) and RLPrompt. Our experiments with GPT-2 demonstrate a fundamental trade-off between interpretability and the task-performance of the trainable prompt, explicating the hardness of the soft prompt interpretability problem and revealing odd behavior that arises when one optimizes for an interpretability proxy.
A Meta-Evaluation of Style and Attribute Transfer Metrics
Pauli, Amalie Brogaard, Augenstein, Isabelle, Assent, Ira
LLMs make it easy to rewrite text in any style, be it more polite, persuasive, or more positive. We present a large-scale study of evaluation metrics for style and attribute transfer with a focus on content preservation; meaning content not attributed to the style shift is preserved. The de facto evaluation approach uses lexical or semantic similarity metrics often between source sentences and rewrites. While these metrics are not designed to distinguish between style or content differences, empirical meta-evaluation shows a reasonable correlation to human judgment. In fact, recent works find that LLMs prompted as evaluators are only comparable to semantic similarity metrics, even though intuitively, the LLM approach should better fit the task. To investigate this discrepancy, we benchmark 8 metrics for evaluating content preservation on existing datasets and additionally construct a new test set that better aligns with the meta-evaluation aim. Indeed, we then find that the empirical conclusion aligns with the intuition: content preservation metrics for style/attribute transfer must be conditional on the style shift. To support this, we propose a new efficient zero-shot evaluation method using the likelihood of the next token. We hope our meta-evaluation can foster more research on evaluating content preservation metrics, and also to ensure fair evaluation of methods for conducting style transfer.
- Asia > China (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- (10 more...)
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators
Moskovskiy, Daniil, Sushko, Nikita, Pletenev, Sergey, Tutubalina, Elena, Panchenko, Alexander
Existing approaches to multilingual text detoxification are hampered by the scarcity of parallel multilingual datasets. In this work, we introduce a pipeline for the generation of multilingual parallel detoxification data. We also introduce SynthDetoxM, a manually collected and synthetically generated multilingual parallel text detoxification dataset comprising 16,000 high-quality detoxification sentence pairs across German, French, Spanish and Russian. The data was sourced from different toxicity evaluation datasets and then rewritten with nine modern open-source LLMs in few-shot setting. Our experiments demonstrate that models trained on the produced synthetic datasets have superior performance to those trained on the human-annotated MultiParaDetox dataset even in data limited setting. Models trained on SynthDetoxM outperform all evaluated LLMs in few-shot setting. We release our dataset and code to help further research in multilingual text detoxification.
- Asia > Russia (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- Europe > Austria > Vienna (0.14)
- (20 more...)